candidate function
Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution
Khatchadourian, Raffi, Vélez, Tatiana Castro, Bagherzadeh, Mehdi, Jia, Nan, Raja, Anita
Efficiency is essential to support ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code -- supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations. Our key insight is that, while DL programs typically execute sequentially, hybridizing imperative DL code resembles parallelizing sequential code in traditional systems. Inspired by this, we present an automated refactoring approach that assists developers in determining which otherwise eagerly-executed imperative DL functions could be effectively and efficiently executed as graphs. The approach features novel static imperative tensor and side-effect analyses for Python. Due to its inherent dynamism, analyzing Python may be unsound; however, the conservative approach leverages a speculative (keyword-based) analysis for resolving difficult cases that informs developers of any assumptions made. The approach is: (i) implemented as a plug-in to the PyDev Eclipse IDE that integrates the WALA Ariadne analysis framework and (ii) evaluated on nineteen DL projects consisting of 132 KLOC. The results show that 326 of 766 candidate functions (42.56%) were refactorable, and an average relative speedup of 2.16x on performance tests was observed with negligible differences in model accuracy. The results indicate that the approach is useful in optimizing imperative DL code to its full potential.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Iowa (0.04)
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
A Dimension-Decomposed Learning Framework for Online Disturbance Identification in Quadrotor SE(3) Control
Quadrotor stability under complex dynamic disturbances and model uncertainties poses significant challenges. One of them is the underfitting problem in high-dimensional features, which limits the identification capability of current learning-based methods. To address this, we introduce a new perspective: Dimension-Decomposed Learning (DiD-L), from which we develop the Sliced Adaptive-Neuro Mapping (SANM) approach for geometric control. Specifically, the high-dimensional mapping for identification is axially "sliced" into multiple low-dimensional submappings ("slices"). In this way, the complex high-dimensional problem is decomposed into a set of simple low-dimensional tasks addressed by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation without any pre-training or persistent excitation (PE) condition. To enhance the interpretability of the proposed approach, we prove that the full-state closed-loop system exhibits arbitrarily-close-to-exponential stability despite multi-dimensional time-varying disturbances and model uncertainties. This result is novel as it demonstrates exponential convergence without requiring pre-training for unknown disturbances or specific knowledge of the model.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.14)
- North America > United States > California > San Diego County > La Jolla (0.04)
Data-driven Discovery of Digital Twins in Biomedical Research
Métayer, Clémence, Ballesta, Annabelle, Martinelli, Julien
Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet, their development still relies on laborious data integration by the human modeler, so automated approaches are critically needed. The success of data-driven system discovery in Physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in Biology, which presents unique challenges. Here, we review methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms according to eight biological and methodological challenges, associated with noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. On these criteria, sparse regression generally outperformed symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France (0.04)
- Research Report (0.81)
- Workflow (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Data-Driven Discovery and Formulation Refines the Quasi-Steady Model of Flapping-Wing Aerodynamics
Kamimizu, Yu, Liu, Hao, Nakata, Toshiyuki
Insects control unsteady aerodynamic forces on flapping wings to navigate complex environments. While understanding these forces is vital for biology, physics, and engineering, existing evaluation methods face trade-offs: high-fidelity simulations are computationally or experimentally expensive and lack explanatory power, whereas theoretical models based on quasi-steady assumptions offer insights but exhibit low accuracy. To overcome these limitations and thus enhance the accuracy of quasi-steady aerodynamic models, we applied a data-driven approach involving discovery and formulation of previously overlooked critical mechanisms. Through selection from 5,000 candidate kinematic functions, we identified mathematical expressions for three key additional mechanisms -- the effect of advance ratio, effect of spanwise kinematic velocity, and rotational Wagner effect -- which had been qualitatively recognized but were not formulated. Incorporating these mechanisms considerably reduced the prediction errors of the quasi-steady model using the computational fluid dynamics results as the ground truth, both in hawkmoth forward flight (at high Reynolds numbers) and fruit fly maneuvers (at low Reynolds numbers). The data-driven quasi-steady model enables rapid aerodynamic analysis, serving as a practical tool for understanding evolutionary adaptations in insect flight and developing bio-inspired flying robots.
- North America > United States (0.04)
- Asia > Japan (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Health & Medicine (1.00)
- Transportation > Air (0.87)
Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: An Empirical Study on Popular Open-Source Projects
Wang, Yuxuan, Chen, Jingshu, Wang, Qingyang
Command injection vulnerabilities are a significant security threat in dynamic languages like Python, particularly in widely used open-source projects where security issues can have extensive impact. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, such as testing, researchers have explored their potential for vulnerability analysis. This study evaluates the potential of LLMs, such as GPT-4, as an alternative approach to automated testing for vulnerability detection. In particular, LLMs have demonstrated advanced contextual understanding and adaptability, making them promising candidates for identifying nuanced security vulnerabilities within code. To evaluate this potential, we applied LLM-based analysis to six high-profile GitHub projects (Django, Flask, TensorFlow, Scikit-learn, PyTorch, and Langchain), each with over 50,000 stars and extensive adoption across software development and academic research. Our analysis assesses both the strengths and limitations of LLMs in detecting command injection vulnerabilities, evaluating factors such as detection accuracy, efficiency, and practical integration into development workflows. In addition, we provide a comparative analysis of different LLM tools to identify those most suitable for security applications. Our findings offer guidance for developers and security researchers on leveraging LLMs as innovative and automated approaches to enhance software security.
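For readers unfamiliar with the vulnerability class the study targets, the following is a minimal sketch of what a command injection pattern in Python typically looks like, next to two common mitigations. It is not taken from the paper or from any of the six projects; the function names, the `ls` command, and the payload string are hypothetical and chosen only for illustration. The commands are built but never executed.

```python
import shlex

def build_unsafe_command(filename):
    # Vulnerable pattern (hypothetical): user input is concatenated into a
    # string that would later be handed to a shell, e.g. via shell=True.
    return "ls -l " + filename

def build_safe_argv(filename):
    # Safer pattern: an argument list for subprocess.run(argv) without a shell.
    # The input stays a single argument; "--" stops option parsing.
    return ["ls", "-l", "--", filename]

def build_quoted_command(filename):
    # If a shell string is unavoidable, quote the untrusted part.
    return "ls -l -- " + shlex.quote(filename)

# Hypothetical attacker-controlled input containing a shell metacharacter.
payload = "notes.txt; echo INJECTED"

unsafe = build_unsafe_command(payload)
safe_argv = build_safe_argv(payload)
quoted = build_quoted_command(payload)

print("unsafe shell string:", unsafe)
print("safe argv:          ", safe_argv)
print("quoted shell string:", quoted)
```

In the unsafe string, a shell would treat `;` as a command separator and run the second command. In the other two forms the whole payload remains one literal filename argument.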
- North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.14)
- South America > Brazil > Mato Grosso do Sul > Campo Grande (0.04)
- North America > United States > New York > New York County > New York City (0.04)
A Bayesian Approach for Discovering Time-Delayed Differential Equation from Data
Chowdhury, Debangshu, Chakraborty, Souvik
Time-delayed differential equations (TDDEs) are widely used to model complex dynamic systems where future states depend on past states with a delay. However, inferring the underlying TDDEs from observed data remains a challenging problem due to the inherent nonlinearity, uncertainty, and noise in real-world systems. Conventional equation discovery methods often exhibit limitations when dealing with large time delays, relying on deterministic techniques or optimization-based approaches that may struggle with scalability and robustness. In this paper, we present BayTiDe (Bayesian Approach for Discovering Time-Delayed Differential Equations from Data), which is capable of identifying arbitrarily large values of time delay to an accuracy that is directly proportional to the resolution of the data input to it. BayTiDe leverages Bayesian inference combined with a sparsity-promoting discontinuous spike-and-slab prior to accurately identify time-delayed differential equations, while efficiently narrowing the search space to achieve significant computational savings. We demonstrate the efficiency and robustness of BayTiDe through a range of numerical examples, validating its ability to recover delayed differential equations from noisy data.
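To illustrate why the recoverable delay is tied to the data resolution, here is a minimal sketch that is not BayTiDe: it swaps the Bayesian spike-and-slab inference for a plain least-squares scan over candidate delays. The test equation x'(t) = -a x(t - tau), the forward-Euler simulation, the noise-free data, and all function names are assumptions made for this illustration. Because a delayed term is built by shifting the sampled series by an integer number of steps, any candidate delay is a multiple of the sampling step dt.

```python
def simulate_delay_decay(a, delay_steps, dt, n_steps):
    """Forward-Euler simulation of x'(t) = -a * x(t - tau), tau = delay_steps * dt."""
    x = [1.0] * (delay_steps + 1)  # constant history on [-tau, 0]
    for _ in range(n_steps):
        x.append(x[-1] + dt * (-a * x[-1 - delay_steps]))
    return x

def fit_delay(x, dt, max_delay_steps):
    """Scan integer delays k; for each, fit dx/dt ~ coef * x(t - k*dt) by least squares."""
    dxdt = [(x[i + 1] - x[i]) / dt for i in range(len(x) - 1)]  # forward difference
    rows = range(max_delay_steps, len(dxdt))  # rows where every shift is defined
    best = None
    for k in range(max_delay_steps + 1):
        num = sum(dxdt[i] * x[i - k] for i in rows)
        den = sum(x[i - k] ** 2 for i in rows)
        coef = num / den
        residual = sum((dxdt[i] - coef * x[i - k]) ** 2 for i in rows)
        if best is None or residual < best[0]:
            best = (residual, k, coef)
    _, k, coef = best
    return k * dt, coef  # the delay estimate is a multiple of dt

dt = 0.1
x = simulate_delay_decay(a=1.0, delay_steps=10, dt=dt, n_steps=300)
tau_hat, a_hat = fit_delay(x, dt=dt, max_delay_steps=25)
print("estimated delay:", tau_hat, "estimated coefficient:", a_hat)
```

With noisy data the residual minimum becomes less sharp, which is one motivation for a probabilistic treatment such as the one the abstract describes.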
Sparse Identification of Nonlinear Dynamics-based Model Predictive Control for Multirotor Collision Avoidance
Lee, Jayden Dongwoo, Kim, Youngjae, Kim, Yoonseong, Bang, Hyochoong
This paper proposes a data-driven model predictive control method for multirotor collision avoidance that accounts for uncertainty and an unknown model arising from a payload. To address this challenge, sparse identification of nonlinear dynamics (SINDy) is used to obtain the governing equation of the multirotor system. SINDy can discover the equations of target systems from limited data, assuming that a few functions dominate the system's behavior. Model predictive control (MPC) is utilized to obtain accurate trajectory tracking performance by considering state and control input constraints. To avoid a collision during operation, the MPC optimization problem is further formulated with inequality constraints on an obstacle. In simulation, SINDy discovers a governing equation of the multirotor system that includes mass parameter uncertainty and aerodynamic effects. In addition, the simulation results show that the proposed method can avoid an obstacle and track the desired trajectory accurately.
- Transportation (1.00)
- Energy > Oil & Gas > Upstream (0.83)
SymbolFit: Automatic Parametric Modeling with Symbolic Regression
Tsoi, Ho Fung, Rankin, Dylan, Caillol, Cecile, Cranmer, Miles, Dasu, Sridhara, Duarte, Javier, Harris, Philip, Lipeles, Elliot, Loncar, Vladimir
We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data, while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we address this problem by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without needing a predefined functional form, treating the functional form itself as a trainable parameter. Our approach is demonstrated in data analysis applications in high-energy physics experiments at the CERN Large Hadron Collider (LHC). We demonstrate its effectiveness and efficiency using five real proton-proton collision datasets from new physics searches at the LHC, namely the background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We also validate the framework using several toy datasets with one or more variables.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Pennsylvania (0.04)
LES-SINDy: Laplace-Enhanced Sparse Identification of Nonlinear Dynamical Systems
The discovery of scientific laws from measurements is a significant intellectual milestone, and its motivation arises from the widespread occurrence of nonlinear dynamical systems in science and engineering. Understanding the governing equations, which often take the form of ordinary differential equations (ODEs), partial differential equations (PDEs), and stochastic differential equations (SDEs), is essential for accurate prediction, effective control, and informed decision-making [1, 2]. In many complex systems, the underlying dynamics remain poorly understood, which renders conventional modeling techniques based on first principles challenging and, at times, intractable. To tackle the challenge of model discovery in dynamical systems, Sparse Identification of Nonlinear Dynamics (SINDy) [3] offers a data-driven solution. Using the given measurements, SINDy constructs parsimonious models that capture the essential features of system dynamics without the need for detailed knowledge of the underlying physics. The strength of SINDy lies in its ability to identify sparse and interpretable models, based on the assumption that the system's dynamics can be represented as a sparse linear combination of candidate functions. This process involves iterative optimization through sparse regression [4] and the selection of the most relevant terms from a comprehensive library, which enables the discovery of governing equations that are both accurate and physically meaningful. Building on the idea of using sparse regression techniques to discover nonlinear dynamical systems, extensive research has been conducted to enhance the SINDy framework for various objectives or to apply it across diverse domains.
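The idea of a sparse linear combination of candidate functions can be sketched with sequentially thresholded least squares, a sparse regression scheme commonly associated with SINDy. The sketch below is not the LES-SINDy method and makes several assumptions for illustration: a one-dimensional system x' = x - x^2, exact noise-free derivatives, a polynomial library [1, x, x^2, x^3], a threshold of 0.1, and a small hand-written least-squares solver so that the example needs only the standard library.

```python
def solve_least_squares(A, b):
    """Solve min ||A c - b||_2 via the normal equations and Gauss-Jordan elimination."""
    n = len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * bi for row, bi in zip(A, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares over the columns of the library theta."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coefs = [0.0] * n_terms
    for _ in range(max_iter):
        sub = [[row[j] for j in active] for row in theta]
        fitted = solve_least_squares(sub, dxdt)
        coefs = [0.0] * n_terms
        for j, c in zip(active, fitted):
            if abs(c) >= threshold:  # keep only terms above the threshold
                coefs[j] = c
        kept = [j for j in active if coefs[j] != 0.0]
        if kept == active or not kept:
            break  # support stopped changing, or nothing survived
        active = kept
    return coefs

# Assumed test system: x' = x - x^2, sampled with exact derivatives.
xs = [0.1 * k for k in range(1, 21)]
dxdt = [x - x * x for x in xs]
library_names = ["1", "x", "x^2", "x^3"]
theta = [[1.0, x, x ** 2, x ** 3] for x in xs]

coefs = stlsq(theta, dxdt)
print({name: round(c, 6) for name, c in zip(library_names, coefs)})
```

In practice derivatives are estimated from noisy measurements, which is where enhancements to the basic scheme become relevant; the threshold is a tuning choice rather than a fixed constant.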
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Scientific Computing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)